The setup files include twitterdatacollect.py, create_dataframe.py and load_pickle.py.
The script twitterdatacollect.py collects tweets made by Donald Trump and Elon Musk, along with the replies to them, using Tweepy and then stores them in a MongoDB database. The resulting MongoDB dump is in the dump directory.
In the script create_dataframe.py, I fetched only the required data from the database, such as the tweets, their date of creation, location etc. From that I created four dataframes: df_trump_tweets, df_trump_replies, df_elon_tweets and df_elon_replies.
I stored the dataframes using pickle so that their values can be quickly referenced inside any file, as they are integral to all the analysis that follows.
load_pickle.py loads those dataframes.
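As a rough sketch of the store-and-reload idea (not the actual contents of create_dataframe.py or load_pickle.py), the helpers below write a dict of named objects to one pickle file and read it back. The names save_frames and load_frames and the file name frames.pkl are hypothetical, and plain dicts stand in for the real dataframes so that the sketch runs without pandas. A pandas DataFrame can be pickled the same way.

```python
import os
import pickle
import tempfile


def save_frames(frames, path):
    """Write a dict of named objects (e.g. dataframes) to a single pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(frames, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_frames(path):
    """Read the dict back. Only unpickle files that you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-ins for the real dataframes.
frames = {
    "df_trump_tweets": {"text": ["sample tweet"], "polarity": [0.5]},
    "df_elon_tweets": {"text": ["another tweet"], "polarity": [0.0]},
}

# Round trip through a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "frames.pkl")
save_frames(frames, path)
restored = load_frames(path)
```

Keeping all the frames in one file means a single load call gives every notebook the same data, at the cost of re-pickling everything whenever one frame changes.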
For setup, simply add
from load_pickle import df_trump_tweets as tt, df_trump_replies as tr, df_elon_tweets as et, df_elon_replies as er
to any file where you would like to access the dataframes.
To collect Twitter data, run twitterdatacollect.py.
For sentiment analysis, I used TextBlob, a popular Python library for processing textual data. TextBlob (apart from its many other cool NLP functions) gives a measure of the polarity and subjectivity of any text. Polarity ranges from -1 to 1: a polarity < 0 indicates that the sentiment expressed is negative, whereas a polarity > 0 indicates that the sentiment is positive. Polarity = 0 represents neutral sentiment.
Subjectivity ranges from 0 (objective) to 1 (subjective).
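The sign convention for polarity can be written as a tiny helper. This is only an illustration: classify_polarity is a hypothetical name, and the sample scores are made up rather than taken from the tweets.

```python
def classify_polarity(polarity):
    """Map a polarity score to a sentiment label by its sign."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# Made-up scores covering the three cases.
labels = [classify_polarity(p) for p in (-0.6, 0.0, 0.35)]
```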
The library used for most of the plotting is Plotly. Plotly offers versatility and interactive graphs in a Jupyter notebook and is also easy to use. It has some really amazing features, like displaying values on hover (so you don't need to read them off the axes anymore). You can also click on the legend entries to make the respective traces appear and disappear. For example, on the scatter plots, simply click on a legend entry to hide that series and see only the other one. It makes analysis a lot easier!
As for the code part:
# # Function to get processed tweets and their polarity, from the create_dataframe.py file
# def clean_tweet(tweet):
#     return ' '.join(re.sub("([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())
# # taken from https://gist.github.com/eledroos/efbe501f359d9791019b19e9ea9d60b6
# def add_analysis(df):
#     clean_tweets_list=[]
#     subjectivity_list=[]
#     polarity_list=[]
#     for tweet in df['text']:
#         tweetnew=clean_tweet(str(tweet)) #clean the tweet
#         clean_tweets_list.append(tweetnew) #Append the clean tweet to a list
#         analysis=TextBlob(tweetnew) #Analyse the sentiment of the clean tweet
#         subjectivity_list.append(analysis.sentiment.subjectivity)
#         polarity_list.append(analysis.sentiment.polarity)
#     df_new = pd.DataFrame({'id_str': df['id_str'],'processed_tweet': clean_tweets_list,'subjectivity': subjectivity_list,'polarity': polarity_list})
#     return pd.DataFrame.merge(df,df_new)
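To see what the cleaning step does before the text reaches TextBlob, here is the same regular expression applied to a made-up tweet. The pattern is the one from clean_tweet above, written as a raw string. The sample text and the name URL_OR_SYMBOL are assumptions for illustration.

```python
import re

# Matches either a single character that is not a letter, digit, space or tab,
# or a whole URL of the form scheme://something.
URL_OR_SYMBOL = re.compile(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet):
    """Replace symbols and URLs with spaces, then collapse repeated whitespace."""
    return " ".join(URL_OR_SYMBOL.sub(" ", tweet).split())


sample = "Hello @user! Visit https://t.co/abc #MAGA"
cleaned = clean_tweet(sample)  # "Hello user Visit MAGA"
```

Note that apostrophes are removed as well, so a word like "don't" ends up as "don t", which may slightly affect the sentiment scores.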
import re
from textblob import TextBlob
from load_pickle import df_trump_tweets as tt, df_trump_replies as tr, df_elon_tweets as et, df_elon_replies as er
import plotly.plotly as py
import plotly.graph_objs as go
#list_collections = ['trump_tweets','trump_replies','elon_tweets','elon_replies']
Plotly uses API calls to plot graphs, and the maximum limit is around 20-25 graphs. You can create your own API keys, or you can uncomment any one of the lines below.
import plotly
#plotly.tools.set_credentials_file(username='sonali18317', api_key='AztkaqxkMkq2bQ1n6rL2') #Already used
#plotly.tools.set_credentials_file(username='yopechenga', api_key='5vh9jlqPkXatKDOBmTQp') #Already used
#plotly.tools.set_credentials_file(username='bopechenga', api_key='f8ATeyYGMXsO7ANHQnVT') #Already used
plotly.tools.set_credentials_file(username='bope_chenga', api_key='Hqutc0MltLQoLyX9ol39')
#plotly.tools.set_credentials_file(username='bope.chenga', api_key='ubb9JG40ceoU9yhgdNHq')
def plot_sentiment(d1,d2,color1,color2,title):
    trace0 = go.Scatter(
        x = d1['polarity'],
        y = d1['subjectivity'],
        name = 'Tweets',
        mode = 'markers',
        marker = dict(
            size = 10,
            color=color1,
            line = dict(
                width = 2,
                color = 'rgb(0, 0, 0)'
            )
        )
    )
    trace1 = go.Scatter(
        x = d2['polarity'],
        y = d2['subjectivity'],
        name = 'Replies',
        mode = 'markers',
        marker = dict(
            size = 10,
            color = color2,
            line = dict(
                width = 2,
            )
        )
    )
    data = [trace0, trace1]
    layout = dict(title = title,
                  yaxis = dict(zeroline = False, title='Subjectivity'),
                  xaxis = dict(zeroline = False, title = 'Polarity'),
                  )
    fig = go.Figure(data=data, layout=layout)
    return fig
fig=plot_sentiment(tt,tr,'rgba(0, 151, 102, 1)','rgba(255, 182, 193, .8)','Sentiment analysis of Trump Tweets')
py.iplot(fig)
fig=plot_sentiment(et,er,'rgba(0, 104, 0, 1)','rgba(204,215, 10, .8)','Sentiment analysis of Elon Musk Tweets')
py.iplot(fig)
For a more comprehensive analysis, I have used bar graphs, which are simple and easy to understand. I have plotted the percentages of positive, negative and neutral sentiment and compared them for tweets and replies. You can also click on the legend to see the percentages for either one individually. The graphs also show that the share of positive sentiment was much greater in the tweets than in their replies.
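The percentage calculation behind these bar graphs can be sketched as one small function. This is an alternative formulation rather than the code used in this notebook: sentiment_percentages is a hypothetical name, the input list is made up, and the guard for an empty input is an addition.

```python
from collections import Counter


def sentiment_percentages(polarities):
    """Return the share (in percent) of positive, neutral and negative scores."""
    counts = Counter(
        "positive" if p > 0 else "negative" if p < 0 else "neutral"
        for p in polarities
    )
    total = sum(counts.values())
    labels = ("positive", "neutral", "negative")
    if total == 0:
        # Avoid dividing by zero when there are no scores at all.
        return {label: 0.0 for label in labels}
    return {label: counts[label] / total * 100 for label in labels}


# Two positive scores, one neutral and one negative.
shares = sentiment_percentages([0.5, 0.2, 0.0, -0.4])
```

Returning a dict keyed by label avoids having to remember the order of the three values when building the bars.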
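# Polarity columns in the order: Trump tweets, Trump replies, Elon tweets, Elon replies
df=[tt['polarity'],tr['polarity'],et['polarity'],er['polarity']]
sentiment=[]
for d in df:
    p=0  # count of positive scores
    n=0  # count of negative scores
    r=0  # count of neutral scores
    for i in d:
        if i>0:
            p+=1
        elif i<0:
            n+=1
        else:
            r+=1
    t=n+r+p
    # Percentages are stored as [positive, negative, neutral]
    sentiment.append([p/t*100,n/t*100,r/t*100])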
trace1 = go.Bar(
    x=['Positive', 'Neutral', 'Negative'],
    y=[sentiment[0][0], sentiment[0][2], sentiment[0][1]],
    name='Trump Tweets'
)
trace2 = go.Bar(
    x=['Positive', 'Neutral', 'Negative'],
    y=[sentiment[1][0], sentiment[1][2], sentiment[1][1]],
    name='Replies to Trump Tweets'
)
data = [trace1, trace2]
layout = go.Layout(title = 'Sentiment of Trump tweets',
                   barmode='group'
                   )
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
A similar analysis for Elon Musk tweets:
trace1 = go.Bar(
    x=['Positive', 'Neutral', 'Negative'],
    y=[sentiment[2][0], sentiment[2][2], sentiment[2][1]],
    name='Elon Musk Tweets',
    # Four-component colours need the rgba() prefix
    marker=dict(color='rgba(144, 12, 63, 1)'),
)
trace2 = go.Bar(
    x=['Positive', 'Neutral', 'Negative'],
    y=[sentiment[3][0], sentiment[3][2], sentiment[3][1]],
    name='Replies to Elon Tweets',
    marker=dict(color='rgba(255, 195, 0, 1)'),
)
data = [trace1, trace2]
layout = go.Layout(title = 'Sentiment of Elon Musk tweets',
                   barmode='group'
                   )
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
I thought the best way to show the content would be a word cloud, since it gives a nice idea of what the tweets are about in a very attractive way.
The libraries used for this purpose are wordcloud and matplotlib.
For the analysis, I simply used the processed_tweet column created in create_dataframe.py.
The function make_text takes the text from the tweets and joins it into one huge string. That string is analysed using wordcloud and plotted using matplotlib.
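A word cloud draws more frequent words larger, so a rough way to check what it will emphasise is to count the words directly. The sketch below is an illustration under assumptions: top_words is a hypothetical helper, the sample tweets and stopword set are made up, and this make_text is a compact variant that matches the notebook's function only up to leading and trailing whitespace.

```python
from collections import Counter


def make_text(tweets):
    """Join all tweets into one uppercase, space-separated string."""
    return " ".join(word.upper() for tweet in tweets for word in tweet.split())


def top_words(tweets, stopwords, n=3):
    """Return the n most frequent words that are not stopwords."""
    words = make_text(tweets).split()
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


# Made-up sample data.
sample_tweets = ["fox news tonight", "great people", "fox and people"]
sample_stopwords = {"AND"}
top = top_words(sample_tweets, sample_stopwords, n=2)
```

Comparing such a list against the rendered cloud is a quick sanity check that the stopword list is removing what you expect.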
from load_pickle import df_trump_tweets as tt, df_trump_replies as tr, df_elon_tweets as et, df_elon_replies as er
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
%matplotlib inline
tweetListtrump=list(tt['processed_tweet'])+list(tr['processed_tweet'])
tweetlistelon=list(et['processed_tweet'])+list(er['processed_tweet'])
stopwords = set(STOPWORDS)
stopwords.update(["REALDONALDTRUMP",'WILL','ELONMUSK'])
#removed realdonaldtrump as it occurs in every tweet and hence redundant for analysis
def make_text(l):
    text = ' '
    for val in l:
        # separate words
        tokens = val.split()
        # Converts every word into UPPERCASE
        for i in range(len(tokens)):
            tokens[i] = tokens[i].upper()
        for words in tokens:
            text = text + words + ' '
    return text
def createAndShowWordCloud(tweetList):
    text1=make_text(tweetList)
    wordcloud = WordCloud(background_color ='white', stopwords = stopwords, collocations=False, min_font_size = 1,width=1600, height=800).generate(text1)
    # Set the figure size before drawing so that it applies to this plot;
    # 16 x 8 keeps the 2:1 aspect ratio of the 1600 x 800 cloud.
    plt.figure(figsize=(16, 8))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()
The word cloud shows valuable information. For example, the most prominent topics in Trump's tweets are Fox News, people and Bush, which matches his recently posted tweets and replies. For example, a post by Donald Trump: 'Looking forward to being with the Bush family. This is not a funeral, this is a day of celebration for a great man who has led a long and distinguished life. He will be missed!'
createAndShowWordCloud(tweetListtrump)
createAndShowWordCloud(tweetlistelon)
#ElonMusk Tweets